Perfect Path Phylogeny Haplotyping with Missing Data Is Fixed-Parameter Tractable

Authors

  • Jens Gramm
  • Till Nierhoff
  • Till Tantau
Abstract

Haplotyping via perfect phylogeny is a method for retrieving haplotypes from genotypes. Fast algorithms are known for computing perfect phylogenies from complete and error-free input instances—these instances can be organized as a genotype matrix whose rows are the genotypes and whose columns are the single nucleotide polymorphisms under consideration. Unfortunately, in the more realistic setting of missing entries in the genotype matrix, even restricted forms of the perfect phylogeny haplotyping problem become NP-hard. We show that haplotyping via perfect phylogeny with missing data becomes computationally tractable when imposing additional biologically motivated constraints. Firstly, we focus on asking for perfect phylogenies that are paths, which is motivated by the discovery that yin-yang haplotypes span large parts of the human genome. A yin-yang haplotype implies that every corresponding perfect phylogeny has to be a path. Secondly, we assume that the number of missing entries in every column of the input genotype matrix is bounded. We show that the perfect path phylogeny haplotyping problem is fixed-parameter tractable when we consider the maximum number of missing entries per column of the genotype matrix as parameter. The restrictions we impose are met by a majority of the problem instances encountered in publicly available human genome data.
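
The parameter named in the abstract, the maximum number of missing entries per column of the genotype matrix, can be illustrated with a short sketch. This is not the authors' algorithm; it only shows how the input and the parameter might be represented. The encoding is an assumption borrowed from common usage in the haplotyping literature: `0` and `1` for homozygous sites, `2` for heterozygous sites, and `?` for a missing entry. The function names `max_missing_per_column` and `explains` are hypothetical.

```python
from typing import List

MISSING = "?"  # assumed marker for a missing genotype entry


def max_missing_per_column(genotypes: List[str]) -> int:
    """Return the largest number of missing entries found in any one column.

    Each string is one genotype (a row); each position is one SNP (a column).
    """
    if not genotypes:
        return 0
    width = len(genotypes[0])
    if any(len(row) != width for row in genotypes):
        raise ValueError("all genotype rows must have the same length")
    counts = (sum(row[j] == MISSING for row in genotypes) for j in range(width))
    return max(counts, default=0)


def explains(h1: str, h2: str, genotype: str) -> bool:
    """Check whether the haplotype pair (h1, h2) is consistent with a genotype.

    Assumed convention: a '2' requires the haplotypes to differ at that site,
    a '0' or '1' requires both haplotypes to carry that value, and a missing
    entry places no constraint on the site.
    """
    if not (len(h1) == len(h2) == len(genotype)):
        return False
    for a, b, g in zip(h1, h2, genotype):
        if g == MISSING:
            continue
        if g == "2":
            if a == b:
                return False
        elif not (a == b == g):
            return False
    return True


if __name__ == "__main__":
    matrix = ["2?01", "0?10", "?12?"]
    print(max_missing_per_column(matrix))      # parameter value for this toy matrix
    print(explains("0101", "1001", "2?01"))    # consistent pair
    print(explains("0101", "0001", "2?01"))    # violates the heterozygous first site
```

In the toy matrix the second column has two missing entries and no column has more, so the parameter for this instance is 2. The fixed-parameter result in the abstract concerns instances where this value stays small even when the matrix itself is large.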

Similar resources

Haplotyping with missing data via perfect path phylogenies

Computational methods for inferring haplotype information from genotype data are used in studying the association between genomic variation and medical condition. Recently, Gusfield proposed a haplotype inference method that is based on perfect phylogeny principles. A fundamental problem arises when one tries to apply this approach in the presence of missing genotype data, which is common in pr...

Influence of Tree Topology Restrictions on the Complexity of Haplotyping with Missing Data

Haplotyping, also known as haplotype phase prediction, is the problem of predicting likely haplotypes based on genotype data. One fast haplotyping method is based on an evolutionary model where a perfect phylogenetic tree is sought that explains the observed data. Unfortunately, when data entries are missing, as is often the case in real laboratory data, the resulting formal problem IPPH, which...

Phylogeny- and Parsimony-Based Haplotype Inference with Constraints

Haplotyping, also known as haplotype phase prediction, is the problem of predicting likely haplotypes based on genotype data. One fast computational haplotyping method is based on an evolutionary model where a perfect phylogenetic tree is sought that explains the observed data. In their CPM’09 paper, Fellows et al. studied an extension of this approach that incorporates prior knowledge in the f...

Phylogeny- and Parsimony-Based Haplotype Inference with Constraints

Haplotyping, also known as haplotype phase prediction, is the problem of predicting likely haplotypes based on genotype data. One fast computational haplotyping method is based on an evolutionary model where a perfect phylogenetic tree is sought that explains the observed data. An extension of this approach tries to incorporate prior knowledge in the form of a set of candidate haplotypes from w...

On the Complexity of SNP Block Partitioning Under the Perfect Phylogeny Model

Recent technologies for typing single nucleotide polymorphisms (SNPs) across a population are producing genome-wide genotype data for tens of thousands of SNP sites. The emergence of such large data sets underscores the importance of algorithms for large-scale haplotyping. Common haplotyping approaches first partition the SNPs into blocks of high linkage-disequilibrium, and then infer haplotype...

Publication date: 2004